Speed up _is_eol_token #1256

correctmost · 2024-08-10T01:21:08Z

This change provides a small speed-up on large codebases (~200 ms out of an ~18.3 s run on the yt-dlp checkout described under Set-up).

Stats

Before

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1478156    0.940    0.000    1.245    0.000 pycodestyle.py:1831(_is_eol_token)
  1472360    0.359    0.000    0.359    0.000 {method 'lstrip' of 'str' objects}

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pycodestyle .` | 18.349 ± 0.161 | 18.119 | 18.641 | 1.00 |

After

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1478156    0.458    0.000    0.459    0.000 pycodestyle.py:1831(_is_eol_token)
   225341    0.057    0.000    0.057    0.000 {method 'lstrip' of 'str' objects}

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pycodestyle .` | 18.145 ± 0.098 | 17.997 | 18.283 | 1.00 |

Set-up

I profiled pycodestyle with the yt-dlp codebase because it is similar in composition to a private codebase I have.

git clone https://github.com/yt-dlp/yt-dlp.git
cd yt-dlp

git checkout ef36d517f9b05785d61abca7691d9ab7d63cc75c

# Callgraph command
python -m cProfile -o stats $(which pycodestyle)

# Benchmarking command
hyperfine --ignore-failure --warmup 2 --runs 15 --export-markdown=baseline.md 'pycodestyle .'

setup.cfg

[pycodestyle]
ignore = W504
max-line-length = 100

Baseline version: c8e36a0
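
For anyone reproducing the numbers, the `stats` file written by the cProfile command above can be filtered down to the two rows shown under Stats. This is a minimal sketch using the standard-library `pstats` module; the helper name `summarize` is hypothetical and not part of this PR.

```python
import io
import pstats


def summarize(stats_source, pattern):
    """Return the profile rows whose function label matches `pattern`.

    `stats_source` may be a path to a file written by
    `python -m cProfile -o <file>` or a cProfile.Profile object.
    `summarize` is a hypothetical helper, not part of pycodestyle.
    """
    buf = io.StringIO()
    stats = pstats.Stats(stats_source, stream=buf)
    # strip_dirs() shortens paths so rows read like pycodestyle.py:1831(...)
    stats.strip_dirs().sort_stats('tottime').print_stats(pattern)
    return buf.getvalue()
```

Calling `print(summarize('stats', r'_is_eol_token|lstrip'))` in the yt-dlp checkout should list only the `_is_eol_token` and `lstrip` rows, assuming the profile was written to a file named `stats` as in the command above.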

correctmost · 2024-08-10T01:31:51Z

pycodestyle.py

+
+    # Check if the line's penultimate character is a continuation
+    # character
+    if token[4][-2] != '\\':


Based on local testing with Python 3.12.4, I had assumed that token[4] (the token's source line) would always have a length >= 2 here.

It looks like that is not the case with all supported versions of Python, so token[4][-2] can raise an IndexError.
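
To make the failure mode concrete, here is a minimal sketch of the fast path with a guard for short lines. It is not the patch from this PR: the function names are hypothetical, the baseline expression is assumed to resemble the pre-change body of `_is_eol_token`, and the sketch uses a length check rather than the `tokenize.ENDMARKER` check mentioned later in this thread.

```python
import tokenize

NEWLINE = frozenset([tokenize.NL, tokenize.NEWLINE])


def is_eol_token_baseline(token):
    # Assumed shape of the original check: slice the rest of the line
    # after the token, strip leading whitespace, compare to a continuation.
    return token[0] in NEWLINE or token[4][token[3][1]:].lstrip() == '\\\n'


def is_eol_token_fast(token):
    # Hypothetical variant, not the PR's exact code.
    if token[0] in NEWLINE:
        return True
    line = token[4]
    # A line can only end in '\\\n' if its penultimate character is a
    # backslash. The length guard avoids IndexError on short or empty
    # lines, such as the empty line attached to an ENDMARKER token.
    if len(line) < 2 or line[-2] != '\\':
        return False
    return line[token[3][1]:].lstrip() == '\\\n'
```

The early return skips the slice and `lstrip` call for most tokens, which is consistent with the drop in `lstrip` calls in the profiles, while the final comparison keeps the result identical to the baseline for lines that do end in a backslash.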

correctmost · 2024-08-10T01:33:51Z

Closing because this approach doesn't work for all Python versions.

correctmost · 2024-08-10T02:54:47Z

I updated the patch and re-ran the benchmarks.

The additional tokenize.ENDMARKER check seems to have reduced the savings from ~200ms to ~100ms.

If this PR seems too risky because of assumptions about tokenization, feel free to pass on it :).

Stats

Before

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1478156    0.885    0.000    1.197    0.000 pycodestyle.py:1831(_is_eol_token)
  1472360    0.364    0.000    0.364    0.000 {method 'lstrip' of 'str' objects}

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pycodestyle .` | 18.472 ± 0.196 | 18.067 | 18.848 | 1.00 |

After

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1478156    0.511    0.000    0.511    0.000 pycodestyle.py:1831(_is_eol_token)
   225341    0.055    0.000    0.055    0.000 {method 'lstrip' of 'str' objects}

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pycodestyle .` | 18.360 ± 0.186 | 18.159 | 18.781 | 1.00 |

Speed up _is_eol_token

d3118d4

correctmost force-pushed the cm/speed-up-eol-token-check branch from f733364 to d3118d4 Compare August 10, 2024 01:24

correctmost closed this Aug 10, 2024

correctmost mentioned this pull request Aug 10, 2024

Speed up _is_eol_token #1257

Closed
